Interactivity

The previous notebook showed all the steps required to get a Datashader rendering of your dataset, yielding raster images displayed using Jupyter's "rich display" support. However, these bare images do not show the data ranges or axis labels, making them difficult to interpret. Moreover, they are only static images, and datasets often need to be explored at multiple scales, which is much easier to do in an interactive program.

To get axes and interactivity, the images generated by Datashader need to be embedded into a plot using an external library like Matplotlib or Bokeh. As we illustrate below, the most convenient way to make Datashader plots using these libraries is via the HoloViews high-level data-science API. Plotly can also be used with Datashader, and native Datashader support for Matplotlib has been sketched but is not yet released.

In this notebook, we will look at how to do this with HoloViews, using the same example data as in the previous notebook.

Embedding Datashader with HoloViews

HoloViews (1.7 and later) is a high-level data analysis and visualization library that makes it simple to generate interactive Datashader-based plots. Here's an illustration of how this all fits together when using HoloViews+Bokeh:

[Figure: Datashader+HoloViews+Bokeh]

HoloViews offers a data-centered approach to analysis, where the same tool can be used with small data (anything that fits in a web browser's memory, which can be visualized with Bokeh directly) and large data (which is first sent through Datashader to make it tractable), and with several different plotting frontends. A developer willing to do more programming can do all the same things separately, using the Bokeh, Matplotlib, and Datashader APIs directly, but with HoloViews it is much simpler to explore and analyze data. Of course, the previous notebook showed that you can also use Datashader without any plotting library at all (the light gray pathways above), but then you wouldn't have interactivity, axes, and so on.

Most of this notebook will focus on HoloViews+Bokeh to support full interactive plots in web browsers, but we will also briefly illustrate the non-interactive HoloViews+Matplotlib approach. Let's start by importing some parts of HoloViews and setting some defaults:

In [1]:
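import holoviews as hv
import holoviews.operation.datashader as hd
hd.shade.cmap=["lightblue", "darkblue"]  # default colormap for shaded images
hv.extension("bokeh", "matplotlib")  # load both backends; the first one listed (Bokeh) is the default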

Next we'll start with the same example from the previous notebook:

In [2]:
import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf
from collections import OrderedDict as odict

num=100000
np.random.seed(1)

dists = {cat: pd.DataFrame(odict([('x',np.random.normal(x,s,num)), 
                                  ('y',np.random.normal(y,s,num)), 
                                  ('val',val), 
                                  ('cat',cat)]))      
         for x,  y,  s,  val, cat in 
         [(  2,  2, 0.03, 10, "d1"), 
          (  2, -2, 0.10, 20, "d2"), 
          ( -2, -2, 0.50, 30, "d3"), 
          ( -2,  2, 1.00, 40, "d4"), 
          (  0,  0, 3.00, 50, "d5")] }

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")

HoloViews+Bokeh

Rather than starting out by specifying a figure or plot, in HoloViews you specify an Element object to contain your data, such as Points for a collection of 2D x,y points. To start, let's define a Points object wrapping around a small dataframe with 10,000 random samples from the df above:

In [3]:
points = hv.Points(df.sample(10000))

points
Out[3]:

As you can see, the points object visualizes itself as a Bokeh plot, where you can already see many of the problems that motivate datashader (overplotting of points, being unable to detect the closely spaced dense collections of points shown in red above, and so on). But this visualization is just the default representation of points, using Jupyter's rich display support; the actual points object itself is merely a data container:

In [4]:
points.data.head()
Out[4]:
               x         y  val cat
184289  2.164107 -2.038032   20  d2
8258    1.997346  1.983239   10  d1
186900  2.176841 -2.070830   20  d2
161735  1.981356 -2.084261   20  d2
149948  2.018556 -2.000011   20  d2

HoloViews+Datashader+Matplotlib

The default visualizations in HoloViews work well for small datasets, but larger ones suffer from overplotting, as is already visible above, and will eventually either overwhelm the web browser (for the Bokeh backend) or take many minutes to plot (for the Matplotlib backend). Luckily, HoloViews supports using Datashader to handle both of these problems:

In [5]:
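hv.output(backend="matplotlib")
agg = ds.Canvas().points(df,'x','y')  # pre-computed aggregate, used by B and C below
# A: datashade the element  +  B: shade a pre-computed aggregate  +  C: embed a pre-rendered image
hd.datashade(points)  +  hd.shade(hv.Image(agg))  +  hv.RGB(np.array(tf.shade(agg).to_pil()))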
Out[5]:

Here we asked HoloViews to plot the data using Datashader+Matplotlib, in three different ways:

  • A: HoloViews aggregates and shades an image directly from the points object using its own datashader support, then passes the image to Matplotlib to embed into an appropriate set of axes.
  • B: HoloViews accepts a pre-computed datashader aggregate, reads out the metadata about the plot ranges that is stored in the aggregate array, and passes it to Matplotlib for colormapping and then embedding.
  • C: HoloViews accepts a PIL image computed beforehand and passes it to Matplotlib for embedding.

As you can see, option A is the most convenient; you can simply wrap your HoloViews element with datashade and the rest will be taken care of. But if you want more control, you can compute the aggregate or the full RGB image yourself using the API from the previous notebook, while still using HoloViews+Matplotlib (or HoloViews+Bokeh, below) to embed the result into labelled axes.
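To make the colormapping step more concrete, here is a minimal pure-Python sketch of turning a grid of counts into colors. It is only an illustration under simplifying assumptions: the name shade_linear is hypothetical, the mapping is a plain linear interpolation between the "lightblue" and "darkblue" endpoints used as the default colormap above, and Datashader's own shading defaults to histogram equalization rather than this linear mapping.

```python
# Toy stand-in for a "shade" step: map a grid of counts to RGB colors.
# Assumption: linear interpolation between two endpoint colors; this is
# NOT Datashader's implementation, which defaults to histogram equalization.

LIGHTBLUE = (173, 216, 230)  # CSS "lightblue"
DARKBLUE = (0, 0, 139)       # CSS "darkblue"

def shade_linear(counts, low=LIGHTBLUE, high=DARKBLUE):
    """Return a grid of RGB tuples, with None for empty (zero-count) cells."""
    nonzero = [c for row in counts for c in row if c > 0]
    if not nonzero:
        return [[None for _ in row] for row in counts]
    lo, hi = min(nonzero), max(nonzero)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal

    def color(c):
        if c == 0:
            return None  # left empty so the plot background shows through
        t = (c - lo) / span  # 0.0 for the lowest count, 1.0 for the highest
        return tuple(round(a + t * (b - a)) for a, b in zip(low, high))

    return [[color(c) for c in row] for row in counts]

counts = [[0, 1, 5],
          [3, 0, 9]]
image = shade_linear(counts)
```

In this sketch the lowest nonzero count gets the light end of the colormap and the highest count the dark end, while empty cells stay unset.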

HoloViews+Datashader+Bokeh

The Matplotlib interface only produces a static plot, i.e., a PNG or SVG image, but the Bokeh interface of HoloViews adds the dynamic zooming and panning necessary to understand datasets across scales:

In [6]:
hv.output(backend="bokeh")
hd.datashade(points)
Out[6]:

Here, hd.datashade is not just a function call; it is an "operation" that dynamically calls Datashader every time Bokeh needs a new plot. The above plot will automatically be interactive when using the Bokeh backend of HoloViews, and Datashader will be called on each zoom or pan event if you are running a live notebook.

Whatever data has been given to the browser can be viewed interactively, but in this case only a single image of the data is given at a time, so you will not be able to see more detail when zooming in unless the Python (and thus Datashader) process is running. In a static HTML export of this notebook, such as those on a website, you'll only see the original pixels getting larger, not a newly rendered zoomed-in image.

If you are running a live process, you can experiment with the interactivity yourself. You can zoom in using a scroll wheel (as long as the "wheel zoom" tool is enabled on the right) or pan by clicking and dragging (as long as the "pan" tool is enabled on the right). Each time you zoom or pan, the callback will be given the new viewport that's now visible, and datashader will render a new image to update the display. The result makes it look as if all of the data is available in the web browser interactively, while only ever storing a single image at any one time. In this way, full interactivity can be provided even for data that is far too large to display in a web browser directly. (Most web browsers can handle tens of thousands or hundreds of thousands of data points, but not millions or billions!)
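The idea of re-rendering for each new viewport can be sketched in plain Python. This is a toy illustration, not how HoloViews or Datashader are implemented: aggregate and on_viewport_change are hypothetical names, the grid is only 4x4, and the data is a single tight cluster similar in spirit to "d1" above.

```python
import random

def aggregate(points, x_range, y_range, width=4, height=4):
    """Count the points falling in each cell of a width x height grid
    that covers only the given viewport."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            # min() guards against floating-point rounding at the upper edge
            col = min(int((x - x0) / (x1 - x0) * width), width - 1)
            row = min(int((y - y0) / (y1 - y0) * height), height - 1)
            grid[row][col] += 1
    return grid

random.seed(1)
# A tight cluster around (2, 2)
points = [(random.gauss(2, 0.03), random.gauss(2, 0.03)) for _ in range(1000)]

def on_viewport_change(x_range, y_range):
    """Hypothetical callback: re-aggregate whatever is now visible."""
    return aggregate(points, x_range, y_range)

def occupied(grid):
    """Number of grid cells that contain at least one point."""
    return sum(c > 0 for row in grid for c in row)

full = on_viewport_change((-5, 5), (-5, 5))          # initial view
zoomed = on_viewport_change((1.9, 2.1), (1.9, 2.1))  # after zooming in
```

In the full view the whole cluster lands in a single cell, while the zoomed view spends the same 4x4 grid on a much smaller region and so resolves structure inside the cluster. Only one small grid exists at any time, regardless of how many points there are.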

Interactive visualization with spread

One advantage of using HoloViews operations is that you can chain them to build expressions for complex interactive visualizations. For instance, here is an interactive version of the spread transformation shown at the end of the previous notebook:

In [7]:
datashaded = hd.datashade(points, aggregator=ds.count_cat('cat')).redim.range(x=(-5,5),y=(-5,5))
hd.dynspread(datashaded, threshold=0.50, how='over').opts(height=500,width=500)
Out[7]:

You can read more about HoloViews support for Datashader at holoviews.org.
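To give a feel for what dynamic spreading does, here is a small pure-Python sketch. It is loosely modeled on the idea of growing each non-empty pixel until the image is dense enough, and it makes several simplifying assumptions that differ from the real operation: the names spread, density, and dynspread below are local toy functions, the spreading shape is a square, overlapping pixels are merged by taking the maximum rather than by 'over' compositing, and the density measure is a simple neighbor fraction.

```python
def spread(grid, px):
    """Expand every non-empty cell by px cells on all sides (square shape),
    keeping the larger value where spread cells overlap."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            v = grid[r][c]
            if v == 0:
                continue
            for rr in range(max(0, r - px), min(h, r + px + 1)):
                for cc in range(max(0, c - px), min(w, c + px + 1)):
                    out[rr][cc] = max(out[rr][cc], v)
    return out

def density(grid):
    """Toy heuristic in [0, 1]: for each non-empty cell, the fraction of its
    8 neighbors that are non-empty, averaged over all non-empty cells."""
    h, w = len(grid), len(grid[0])
    total, n = 0.0, 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] == 0:
                continue
            neighbors = sum(
                grid[rr][cc] > 0
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            )
            total += neighbors / 8
            n += 1
    return total / n if n else 0.0

def dynspread(grid, threshold=0.5, max_px=3):
    """Spread by 1, 2, ... pixels, stopping once the density heuristic
    reaches the threshold or max_px has been applied."""
    out = grid
    for px in range(1, max_px + 1):
        if density(out) >= threshold:
            break
        out = spread(grid, px)
    return out

# Two isolated points on a 7x7 grid
grid = [[0] * 7 for _ in range(7)]
grid[1][1] = 1
grid[5][5] = 2
result = dynspread(grid, threshold=0.5)
```

With two isolated points the initial density is zero, so each point is grown into a 3x3 block, at which point the heuristic passes the 0.5 threshold and spreading stops. A denser image would be left untouched, which is why this kind of operation is useful when the same plot is viewed at many zoom levels.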

HoloViews+Datashader+Bokeh Legends

Because the underlying plotting library only ever sees an image when using Datashader, providing legends and keys has to be handled separately from any underlying support for those features in the plotting library. We are working to simplify this process, but for now you can show a categorical legend by adding a suitable collection of labeled dummy points:

In [8]:
from datashader.colors import Sets1to3

datashaded  = hd.datashade(points, aggregator=ds.count_cat('cat'), color_key=Sets1to3)
gaussspread = hd.dynspread(datashaded, threshold=0.50, how='over').opts(height=400,width=400)

color_key = list(zip(["d1","d2","d3","d4","d5"], Sets1to3))
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c) for n,c in color_key})

color_points * gaussspread
Out[8]:

HoloViews+Datashader+Bokeh Hover info

As you can see, converting the data to an image using Datashader makes it feasible to work with even very large datasets interactively. One unfortunate side effect is that the original datapoints and line segments can no longer be used to support "tooltips" or "hover" information directly; that data simply is not present at the browser level, and so the browser cannot unambiguously report information about any specific datapoint. Luckily, there are still several ways to provide hover information: you can report properties of a subset of the data in a separate layer, you can provide information for a spatial region of the plot rather than for specific datapoints, or you can provide hover per pixel if you let Bokeh do the colormapping for you. As an example of explicitly constructing hover information, let's calculate the point counts for each small square region:

In [9]:
from holoviews.streams import RangeXY

pts = hd.datashade(points, width=400, height=400)

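# Fixed hover: a 10x10 grid of counts computed once over the full data range
quadmesh =  hv.QuadMesh(hd.aggregate(points, width=10, height=10, dynamic=False)) \
            .opts(tools=['hover'], alpha=0, hover_alpha=0.2)

# Dynamic hover: the 10x10 grid is recomputed whenever the visible range changes
dynamic = hv.util.Dynamic(hd.aggregate(points, width=10, height=10, streams=[RangeXY]), 
                          operation=hv.QuadMesh) \
          .opts(tools=['hover'], alpha=0, hover_alpha=0.2)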

(pts * quadmesh).relabel("Fixed hover") + (pts * dynamic).relabel("Dynamic hover")
Out[9]:

In the above examples, the plot on the left provides hover information at a fixed spatial scale, while the one on the right reports on an area that scales with the zoom level so that arbitrarily small regions of data space can be examined, which is generally more useful.
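The difference between the two hover grids comes down to which range the 10x10 grid is computed over. Here is a minimal pure-Python sketch of that arithmetic along one axis; cell_edges and cell_containing are hypothetical helper names, and the ranges are illustrative values rather than anything read from the plots above.

```python
def cell_edges(lo, hi, n=10):
    """Edges of n equal-width hover cells spanning [lo, hi]."""
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]

def cell_containing(value, edges):
    """Return the (left, right) edges of the cell a hover position falls in."""
    for left, right in zip(edges, edges[1:]):
        if left <= value <= right:
            return left, right
    return None  # hover position is outside the grid

full_view = (-5.0, 5.0)    # assumed full extent of the data
zoomed_view = (1.5, 2.5)   # assumed viewport after zooming in

fixed_edges = cell_edges(*full_view)       # computed once, never updated
dynamic_edges = cell_edges(*zoomed_view)   # recomputed for the current viewport

hover_x = 2.02
fixed_cell = cell_containing(hover_x, fixed_edges)
dynamic_cell = cell_containing(hover_x, dynamic_edges)
```

In this sketch the fixed grid always reports on a cell one data unit wide, no matter how far you zoom in, while the dynamic grid's cells shrink along with the viewport, here to about a tenth of a unit.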

As you can see, HoloViews makes it just about as simple to work with Datashader-based plots as regular Bokeh plots (at least if you don't need hover or color keys!), letting you visualize data of any size interactively in a browser using just a few lines of code. Because Datashader-based HoloViews plots are just one or two extra steps added on to regular HoloViews plots, they support all of the same features as regular HoloViews objects, and can freely be laid out, overlaid, and nested together with them. See holoviews.org for examples and documentation for how to control the appearance of these plots and how to work with them in general.

HoloViews+Datashader+Panel

To interactively explore data in a dashboard, you can combine Panel with HoloViews and Datashader to create an interactive visualization that allows you to toggle aggregation methods, edit colormaps, and generally interact with the data through the use of widgets (50 lines of code):

[Figure: Datashader+HoloViews+Panel]

